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§ I Abstract 

■ Radio-astronomical observations are increasingly contaminated by interference, and suppression techniques become essential. 
A powerful candidate for interference mitigation is adaptive spatial filtering. We study the effect of spatial filtering techniques on 
radio astronomical imaging. Current deconvolution procedures such as CLEAN are shown to be unsuitable to spatially filtered 
data, and the necessary corrections are derived. To that end, we reformulate the imaging (deconvolution/calibration) process as a 

■ sequential estimation of the locations of astronomical sources. This not only leads to an extended CLEAN algorithm, the formula- 
' tion also allows to insert other array signal processing techniques for direction finding, and gives estimates of the expected image 
. quality and the amount of interference suppression that can be achieved. Finally, a maximum likelihood procedure for the imaging 
' is derived, and an approximate ML image formation technique is proposed to overcome the computational burden involved. Some 

<^ of the effects of the new algorithms are shown in simulated images. 

^\ Keywords: Radio astronomy, synthesis imaging, parametric imaging, interference mitigation, spatial filtering, maximum hkeU- 
i hood, minimum variance, CLEAN. 
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00 ' L Introduction 

o ; 

, The future of radio astronomical discoveries depends on achieving better resolution and sensitivity while maintaining immunity 
to terrestrial interference which is rapidly growing. The last two demands are obviously contradicting as improved sensitivity 
implies receiving more interfering signals. One possible track is to switch to massive phased array technology. If instead of the 
huge dishes which became the trademark of radio astronomy, we use phased array radio-telescopes comprised of tens of thousands 
' of small elements, then we gain both in terms of resolution and sensitivity while increasing the flexibility to mitigate interference. 
5^ ' The international effort in this direction is coordinated under the framework of the Square Kilometer Array project (SKA). In this 
I paper we try to analyze the effect of on-line interference rejection on the image formation process in such an instrument. 

We briefly describe the current status of radio astronomical imaging; for a more extensive overview the reader is referred to 
^ P^ ] or p6[|. The principle of radio interferometry has been used in radio astronomy since 1946 when Ryle and Vonberg 
constructed a radio interferometer using dipole antenna arrays [^9|]. During the 1950's several radio interferometers which use 
rN the synthetic aperture created by movable antennas have been constructed. In 1962 the principle of aperture synthesis using earth 
rotation has been proposed [ |30[ |. The basic idea is to exploit the rotation of the earth to obtain denser coverage of the visibility 
domain (spatial Fourier domain). 

The first instrument to use this principle was the five kilometer Cambridge radio telescope. During the 1970's new instru- 
ments with large aperture have been constructed. Among these we find the Westerbork Synthesis Radio Telescope (WSRT) in the 
Netherlands and the Very Large Array (VLA) in the USA. Even these instruments subsample the Fourier domain, so that unique 
reconstruction is not possible without some further processing known as deconvolution. The deconvolution process uses some 
a-priori knowledge about the image to remove the effect of "dirty beam" side-lobes. 

Two principles dominate the astronomical imaging deconvolution. The first method was proposed by Hogbom Jl^ ] and is 
known as CLEAN. The CLEAN method is basically a sequential Least-Squares (LS) fitting procedure in which the brightest 
source location and power are estimated. The response of this source is removed from the image and then the process continues to 
find the next brightest source, until the residual image is noise-like. During the years it has been partially analyzed [|3l|], [ |3^ ] and 
D31]. However full analysis of the method is still lacking due to its iterative nature. 



A second root proposed by Jaynes 1 13 1, 1 14| is maximum entropy deconvolution (MEM). The basic idea behind MEM is the 
following. Among all images which are consistent with the measured data and the noise distribution not all satisfy the positivity 
demand, i.e., the sky brightness is a positive function. Consider only those that satisfy the positivity demand. From these select the 
one that is most likely to have been created randomly. This idea has also been proposed in ^] and applied to radio astronomical 
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imaging in [111]]. Other approaches based on the differential entropy have also been proposed ^ and [p3|]. An extensive collection 
of papers discussing the various methods and aspects of maximum entropy can be found in the various papers in [^7|]. 

The above mentioned algorithms assume perfect knowledge of the instrumental response (point spread function). Due to 
various internal and external effects this assumption holds only approximately. One way to overcome this problem is the use 
of calibrating sources. An unresolved source with known parameters is measured, and by relating the model errors to the array 
elements a set of calibration equations is solved. A much more appealing solution is to try to improve the fitting between the data 
and the sky model by adjusting the calibration parameters. Another possibility is to use the redundant structure of the array to 
solve for the calibration parameters (this is possible only for some arrays which have redundant baselines, such as the WSRT). A 
good overview of the various techniques is given in 1^]. 

A major problem facing radio astronomy is the accelerated use of the electro-magnetic spectrum. Even in bands which are 
reserved to radio astronomical observation one can find interference, e.g., sidelobes of emissions from the Iridium or GLONASS 
satellites. As was shown in interferometric arrays are less sensitive to interference than single dish instruments. However the 
interference still appears in images, especially for observations at frequency bands not specifically reserved for radio astronomy. 
Many efforts all over the world are currently put into improving the interference mitigation capabilities of radio telescopes. The 
methods considered span over all possible domains: temporal p^], [[Tc|], [^, spatio-temporal [[l^], spatio-spectral [|T]], [p^, 
[jlTl], and wavelet based methods p^]. Also considered are techniques which use statistical and deterministic properties such as 
non-Gaussianity and constant modulus of the interferers in order to use blind-beamforming, to remove the interferers. 

As far as multi-element synthesis imaging radio telescopes are concerned, the spatial methods are extremely appealing, since 
each interferer has its own "spatial signature". Estimating these signatures enables efficient mitigation of the interferer using 
phased-array techniques. However two important questions shade over these methods. As we will show in this paper, applying 
time varying spatial filters makes the point spread function space varying. In particular the image formation techniques will have 
to be modified accordingly. To cope with this problem we reformulate the classical Fourier imaging framework in a parametric 
manner more appropriate for statistical analysis of the interference mitigation. This enables us to incorporate the spatial filtering 
naturally into the image formation process. 

Our reformulation of the image formation problem describes the measurements as a set of covariance matrices, measured at 
the various observation epochs. This yields a model where the array response is time varying. Previous research on time varying 
arrays and their application to direction-of arrival (DOA) estimation includes |^], where a maximum likelihood estimator for a 
single source and the corresponding Cramer-Rao Bound (CRB) are derived. For multiple sources proposes three approaches: 
A modified beamforming, a virtual interpolated array and focusing matrices. The main drawback of the last two methods is that 
they do not lend themselves to estimating more sources than the number of physical sensors. In [ ^4[ ] a generalized LS (GLS) 
approach is proposed. The idea is to present the problem of estimating the source powers (given the DOA parameters) as a linear 
problem. This enables a closed form solution for the powers. Substituting back into the GLS estimators, the problem is reduced to 
a multidimensional search problem over the DOA's. The main drawback of this approach is the need to invert very large matrices 
(p^ X p^, where p is the number of physical sensors). It is also shown that the asymptotic performance of the GLS and the maximum 
likelihood estimators is identical up to second order, which makes it very attractive. All these papers deal with the case of a fully 
known array response, and are limited to one-dimensional location parameters. In this paper we also extend the above methods in 
several directions. First we use the eigenstructure, to mitigate strong interferers for which we do not know the array response, and 
analyze the efficiency of this procedure. Second we propose other relatively "low" complexity approaches, based on eigenstructure 
and Minimum Variance Distortionless Response (MVDR). Finally we discuss a maximum likelihood approach to the astronomical 
image formation problem, derive an approximate ML estimator (AML) of low complexity and discuss the relation of the AML to 
the CLEAN algorithm. Our treatment of the radio-astronomical imaging can be applied to other problems of imaging with time 
varying sensor responses such as ISAR/SAR radar imaging in the presence of strong jammers. The model then generalizes the 
model given in [|5]| to the time varying context. For the case of an array with constant time behavior (connected element array) 
methods of computing the MLE have been proposed in [ ^5| ] using the EM method or |Q] using the SAGE algorithm. However, there 
is no straightforward extension of these methods to our context. Although an EM algorithm can be applied, the dimensionality of 
the parameter space makes it infeasible to perform full MLE without very good initialization. The paper contains an extensive 
overview on image formation principles and algorithms, as well as previous work on parametric image formation in other fields 
such as radar and tomography. 

To allow the paper to be of use both to the information theory/signal processing and to the radio astronomical communities 
the introductory part is of a tutorial nature. The structure of the paper is as follows. In section ^ we describe the astronomical 
measurement process and introduce an often-used coordinate system. The measurement equation is subsequently rephrased in 
a more convenient matrix formulation in section and extended with the effect of interference. In section |^ we describe 
several basic spatial filtering approaches to on-line interference suppression, and compute the residual interference after adaptive 
estimation of its parameters for one specific case. In section and VI we discuss the image formation process, first based on 
classical techniques (CLEAN), then extended to other beamforming methods and taking the spatial filtering into account. Finally, 
we derive an approximate ML algorithm for image formation. We end up with conclusions regarding future implementation of 
on-Hne interference suppression in radio-astronomy. 
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II. Astronomical measurement equations 

In this section we describe a simplified mathematical model for the astronomical measurement and imaging process. Our 
discussion follows the introduction in [|6|]. We begin with the measurement equation, to reformulate it into a matrix form in the 
next section. This will allow us to obtain a uniform description of various astronomical imaging operations such as deconvolution 
and self-calibration. 

The signals received from the celestial sphere may be considered as spatially incoherent wideband random noise. It is possibly 
polarized and perhaps contains spectral absorption or emission lines. Rather than considering the emitted electric field at a location 
on the celestial sphere, astronomers try to recover the intensity (or brightness) //(s) in the direction of unit-length vectors s, 
where / is a specific frequency. Let Ef{r) be the received celestial electric field at a location r on earth (see figure [l|(a)). The 
measured correlation of the electric fields between two identical sensors i and j with locations and rj is called a visibility and is 
(approximately) given by [|6||[] 

J sky 

(E{ • } is the mathematical expectation operator, the superscript ^ denotes the transpose of a vector, and overbar denotes the 
complex conjugate.) Note that it is only dependent on the oriented distance — between the two antennas; this vector is called 
a baseline. 

For simplification, we may sometimes assume that the astronomical sky is a collection of d discrete point sources (maybe 
unresolved). This gives 



1=1 

where s; is the coordinate of the /'th source, and thus 

d 

1=1 

Up to this point we have worked in an arbitrary coordinate system. For earth rotation synthesis arrays, a coordinate system is 
often introduced as follows. We assume an array with antennas that have a small field of view and that track a reference source 
direction Sq in the sky. Other locations in the field of view can be written as s = Sq + <t , Sq -L cr (valid for small cr) and a natural 
coordinate system is 

So = [0, 0, 1]^ , <T = [e, m, of . 
Similarly, for a planar array, the receiver baselines can be parameterized as 

- = X[u, V, wf , J ' 

The measurement equation in {u, v, w) coordinates thus becomes 

///,R-.)e— .«,„^ ,2) 

The factor e^'^^^^ is caused by the geometrical delay associated to the reference location, and can be compensated by introducing 
a slowly time-variant delay (see figure |l](&)). This synchronizes the center of the field-of-view and makes the reference source 
location appear as if it was at the north pole. After compensation, we arrive at a measurement equation in (u, v) coordinates only, 

Vfiu,v) = // //(£,m)e-2'^J("^+™)d^dTO. (3) 



It has the form of a Fourier transformation. 

The function Vf {u, v) is sampled at various coordinates (u, v) by first of all taking all possible sensor pairs i, j or baselines 
— rj, and second by realizing that the sensor locations r^, rj are actually time-varying since the earth rotates. Given a sufficient 
number of samples in the (u, v) domain, the relation can be inverted to obtain an image (the 'map'), which is the topic of section 

0- 

^To simplify notation we do not include in our model the directional response of the elements of the radio telescope. This can be included in a 
straightforward manner like in |{26|] chapter 1 section 4.3 
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III. Array signal processing formulation 

A. Obtaining the measurements 

We will now describe the situation from an array signal processing point of view. The signals received by the antennas are 
amplified and downconverted to baseband. A time-varying delay for every antenna is also introduced, to compensate for the 
geometrical delay. Following traditional array signal processing practices, the signals at this point are called Xi{t) rather than 
Ef{r), and are stacked in vectors 

x(t) = [xi{t), . . . ,Xp{t)f , 

where p is the number of antennas. These are then processed by a correlation stage. 

It will be convenient to assume that x{t) is first split by a bank of narrow-band sub-band filters into a collection of frequency- 
components x/(i). The main output of the telescope hardware is then a sequence of empirical correlation matrices R/(t) of 
cross-correlations of x/(t), for a set of frequencies / € {fk} covering a 10 MHz band or so, and for a set of times t g {tk} 
covering up to 12 hours ^ Each correlation matrix R/(t) is an estimate of the true covariance matrix R/(i), 



1 

Rfit) = E{x^.(i)x/(i)«} , Rfit) = ^ E ^/(^ + "^)^/(^ + "^)" ' 



n=0 

where the superscript " denotes a complex conjugate transpose, T is the sample period of x/ {t) and N is the number of samples 
over which is averaged. The matrices R/(t) are stored for off-line spectral analysis and imaging. Typically, each sub-band has a 
bandwidth in the order of 100 kHz or less. Due to the sub-band filtering, the original sampling rate of x(t) is reduced accordingly, 
resulting in T in the order of 10 /is and the number of samples N in the order of 10^ for each sub-band. / represents the 
center frequency in a sub-band. From now on we consider the sub-bands independently ignoring that they are really connected. 
Consequently, in future equations we drop the dependence on / in the notation. 

The connection of the correlation matrices R(t) to the visibilities V{u, v) in section [| is as follows. Each entry (t) of the 
matrix R(t) is a sample of this visibility function for a specific coordinate {u, v) corresponding to the baseline vector (t) — rj {t) — 
X [uij (t ) , Vij (t) , Wij {t)] between telescopes i and j at time t: 

V{u^j{t),v^j{t)) = r,,{t). (5) 

Note that we can obtain only a discrete set of {u, v) sample points. Indeed, the number of instantaneous independent baselines 
between p antennas is at most ^p{p — 1). Also, using the earth rotation, we have a finite set {t^, fc = 1, • • • , K}, where the 
number of epochs K is given by the ratio of the observation time and the covariance averaging time (e.g., K ^ 12 h/30 sec = 1440 
samples). The available sample coordinates {uy,/c, %,/c} give an irregular cover of the (u, v) plane. For an East- West line array 
such as WSRT, the points lie on ellipses. A practical issue is the implementation of the geometrical delay compensation. It is 
usually introduced only at the back end of the receiver. At this point, also a phase correction is needed to compensate for the factor 
g-27rj'i«ij (t) j.jjg measurement equation This is referred to as fringe correction [^. Since the earth rotates, Wij {t) is slowly 
time varying, with a rate of change in the order of 0-10 Hz depending on source declination and baseline length. 

B. Matrix formulation 

For the discrete source model, we can now formulate our measurement equations in terms of matrices. Let Yo{tk) be an 
arbitrary and time-varying reference point, typically at one of the elements of the array, and let us take the (u, w, w) coordinates of 
the other telescopes with respect to this reference, ri(t) — ro(i) = A[uio(t), ViQ{t), WiQ{t)], i = 1, ••• ,p . Equation (jl]) can 
then be written slightly differently as 

d 

1=1 
d 

i=x 

In terms of correlation matrices, this equation can be written as R^ = A^BA", where R^ = R(tfe), 

Afc = [afe(^i,TOi), . . . ,SLk{ld,rnd)] 

Many telescope sites including WSRT follow actually a different scheme where the signals are first correlated at several lags and subsequently 
Fourier transformed. This leads to similar results. 
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and 



aLk{i,m) 



B 



p-2Trj(tiio(tfc)f+i;io(tfe)"i) 
^-2TTj{Upo(tk)e+Vpo{tk)m) 



(6) 







The vector function {£, m) is called the array response vector in array signal processing. It describes the response of the telescope 
array to a source in the direction (£, m). As usual, the array response is frequency dependent. In this case, the response is also 
slowly time-varying due to the earth rotation. Note, very importantly, that the function as shown here is completely known. 

More realistically, the array response is less perfect. An important effect is that each antenna may have a different complex 
receiver gain, 7i(t), dependent on many angle-independent effects such as cable losses, amplifier gains, and (slowly) varying 
atmospheric conditions. We also have to realize that most of the received signal consists of additive system noise. When this noise 
is zero mean, independent among the antennas (thus spatially white), and identically distributed, then it has a covariance matrix 
that is a multiple of the identity matrix, a^I, where is the noise power on a single antenna inside the subband which we consider. 
Usually the noise is assumed to be Gaussian. The resulting model of the received covariance matrix then becomes 



R 



where 



k — 



Tk = 



r,.AfcBA«r^ 



7i,fc 




7p,fc 



(7) 



(8) 



Note that this assumes that the noise is introduced after the varying receiver gains. This assumption is reasonable if the channels 
from the low-noise amplifier (LNA) outputs to the analog-to-digital converter (ADC) units are equal. Otherwise, it is still reasonable 
to assume that the noise is spatially white, i.e., the noise covariance matrix is diagonal. We can assume that the receivers noise 
power can be estimated using various calibration techniques; a simple diagonal scaling will then bring us back to the model (^). 

C. RF interference 

Radio frequency interference (RFI) usually enters the antennas through the sidelobes of the main beam. It can be stronger 
or weaker than the system noise. An important property is that it has a certain directivity, so that it does not average out in the 
correlation process. Examples of harmful RFI are television broadcasts, geolocation satellites (GPS, GLONASS), taxi dispatch 
systems, airplane communication and navigation signals, wireless mobile communication (GSM) and satellite communication 
signals (Iridium). Thus, interferers may be continuous or intermittent, narrow-band or wideband, and strong or weak. 

Suppose that we have a single interferer impinging onto the telescope array. The interfering signal reaches the array with 
different delays for each telescope. Assuming processing in narrow sub-bands as before, delays translate into phase shifts,^] and 
the received signal can be modeled as Xi{t) = ai s{t) e^^'^^-f'^', or in vector notation. 



x(t) 



s{t) a.s{t) . 



Here, s{t) is the baseband signal, and represents the telescope gain in the direction of the interferer, including any possible 
attenuation of the channel. Unlike much of the array signal processing literature, the are likely to be different for each telescope 
since the interferer is typically in the near field. This implies that it impinges on each telescope at a different angle, whereas the 
response of the telescopes is not omni-directional. Hence, the corresponding array response vector a is now an unknown function. 
This vector is also called the spatial signature of the interfering source. It is slowly time varying, and we write a = a(t). 
Similarly, with q interferers. 



Slit) 



AS) = [ai(t),--- , a,(t)] . 



For this, the processing bandwidths should be much less than the inverse of the maximal delay. For example, in WSRT the largest baseline is 
3000 m, corresponding to a maximal delay of 10 /is. Hence the narrow-band assumption holds for bandwidths less than 100 kHz ||2l|]. 
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The subscript 's' is used to distinguish {t) from the array response matrix of the astronomical sources. 
The corresponding correlation matrix at time tk is 

Rk = E{x(tfc)x(ifc)"} = (A,)fc(R,),.(A^)fc. 

The q x g-matrix (Rs)fe = E{s(tfc)s"(tfc)} depends on the correlational properties of the interfering signals. For independent 
interferers, it will be a diagonal matrix, with the q interfering powers on the diagonal. 

How well an empirical estimate fits to R^ depends on the stationarity of the scenario, and is open to discussion. For various 
reasons (mobile interferers with multipath fading, fixed interferers such as TV stations moving through the varying sidelobes of the 
rotating telescopes, fringe corrections of up to 10 Hz), the stationarity of is often limited to about 10-100 ms. In the rest of the 
paper, we make the assumption that indeed the available R^ are obtained over stationary periods. In summary, the overall model 
including astronomical signals, array imperfections, interference and noise is given by: 

Rfe = TfeAfeBA^rfe + (A,)fc(R,)fe(A,)^ + a^I, k = 0,l,---. (9) 

where we assume that the interference term A^ is unstructured, and rank(As) = q < p. 

IV. Spatial filtering 

An online interference mitigation system will consist of two stages. As a first step the presence of interference is detected. 
This part is considered in [ p^ and demonstrated on astronomical data in [|o|]. In the case of continuous interference it is 
reasonable to use its spatial signature in order to remove it. This leads to spatial filtering techniques. 

A. Projecting out the interferer 

Let us assume that we have obtained a covariance matrix R, which contains the rather weak covariance matrix of the astro- 
nomical sources (visibilities) R^ — rfcAfcBA^Ffc, plus white noise. [| Suppose also that there is an interferer with power a^: 

R = R„ + ala.a^ + cr'^I. 

Assuming that a is known it is possible to null all energy with spatial signature a. To this end, we can introduce the projection 
matrices 

P/ H N— 1 H nJ_ T / H N— 1 H 

a = a(a a) a , =1 a(a a) a . 

It is easily seen that a = 0, so that if we apply P^ as a spatial filter to R, we obtain 

R := P^RP^ = P^R.P^ + <T^Pi, . (10) 

Thus, the interference is completely removed. At the same time, the visibility matrix is modified by the projections, and the noise 
is not white anymore, since one dimension is missing. The imaging stage has to be aware of this, which is the topic of section 0. 

This idea is also applicable to multiple narrowband interferers, and we do not need to know the spatial signatures of the 
interferers in advance. Indeed, if the total number of interferers inside a subband is less than q < p, an eigenvalue decomposition 
allows to estimate the corresponding "interference subspace" spanned by the spatial signatures from the data covariance matrix, 
and subsequently project out this subspace. 

Thus let R — UAU'* be the eigendecomposition of R. For the purpose of interference cancellation we assume that the sky 
sources are weak: R„ ^ a^I, and thus their influence can be ignored in the eigendecomposition. Let U = [Ug U„] where 
is p X q and contains the eigenvectors corresponding to the q largest eigenvalues, and U„ collects the remaining eigenvectors. In 
the noise free case, R has rank q and R — A^RgA^ — UsAU^ . In the noisy case, R = A^RgA" + ct^I with eigenvalue 
decomposition 



R = U,A3U^4 




[U, u„ 


H 






= [U, U„] 


" As + a% 







[ u« 1 


(11) 














Therefore, the smallest eigenvalue (ti^) has multiplicity p — q, and 

span(Us) = span(A,) , U^A, = . (12) 
*In this section we consider a single covariance matrix so we drop the index k. 
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We refer to Ug as the interference subspace. According to dl^), -L U„, so that P;^ = U„Uj^. Thus, the eigenvalue 
decomposition of R allows us to detect the number of interferers (from the number of repeated small eigenvalues) and to identify 
the projection matrix P;^ = U„U" to project them out, as in (|lo|). Note that we do not have to know A^. This hinges upon 
the fact that the noise covariance is white (in general: known), and the visibility matrix R„ is insignificant at these time scales 
(otherwise, it might disturb the eigenvalue decomposition). 

In practice, we only have a sample estimate R of R. The eigenvalue decomposition of this matrix, 

r = u,a,u^ + u„a„uj:. 

gives a maximum likelihood estimate of U„ [^. 

One might be worried that if we use the estimated subspace for the projections, it might leave correlated residual interference 
components that eventually will show up in the final image. This is in fact not the case, as we will now demonstrate that the residual 
interference is spatially white. 

B. Residual interference after projections 

A perturbation analysis of the eigenvalue decomposition allows to obtain asymptotic expressions for the residual interference 
in the covariance matrices after spatial filtering. To this end, we utilize the following theorem from |WQ], a proof of which is given 



in [361 



Theorem IV.l: Let R be the sample covariance matrix based on N samples of a p— dimensional complex Gaussian random 
process, with zero mean and covariance R. Let R = UAU" be an eigenvalue decomposition of R, with U = [ui , . . . , Up] 
unitary, and A = diag[Ai, • • • , Ap], where Ai > . . . > Ag > tr^, and A^+i = ■ ■ ■ = Xp ~ a^. Let Us — [ui, . . . , u,], then for 
m > qwe have that Pu^ Um is asymptotically a zero-mean Gaussian random process with variance determined by 



E{(Pu^uO(Pu.iijn-^ 



An 



^ (A„ - (7^y 
_n— 1 ^ ^ 

normalized to ||a|p = p. In this case, Ai = paj + a^, Ui = a/||a|| = a/ and (jlj) gives 



5,^j + o(iV-i), i,j>q. (13) 



For simplicity, let us specialize to the case of g = 1 narrowband interferer, with power per sample and spatial signature a 



Now note that II Pu.u^f = |lui<u„f ^ IKu^f = HumU^^Uif = HPfl^Uif , so that 

E{||Pa„a||2} = pE{||Pfl„Ui||2} = pE {||Pu,u^f } . 

Similarly we get for m 7^ n, E{(Pu„Ui)''(Pq„Ui)} = 0. 

If we define the input Interference to Noise ratio (INR) as INR — ui/ a^, then we finally obtain 

fiiTi ii2i a'^pal^a'^ ppINR+1 
E{llP«„a||^} = f-v^^ = AF^^lNRjr- '» > 1 (>4) 



Let us assume that the estimate ai is approximately independent of the interfering signal. The residual INR at the output after 
spatial filtering is then 

, E I II pJ- all 2 1 1 / 1 \ 

INR' ^ ^" " ^ = A 1 + (16) 
0-2 p-\ N \ 39 INR J 



These expressions are very satisfactory. Indeed note from ( |14| ) that the expected residual interference power is the same in each 
of the directions Um, which together form an orthonormal basis of the projected space. This means that the residual interference is 
spatially white within the projected space (up to second order (in -^) effects), and only increases the effective noise power without 
adding spatial features to it. The effective noise power at the output is 

(a'f =a^ll + l(l + ^— 
^ ' \ N \ pINR 
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Figure § shows the residual interference in a simulation, for N — 100 samples and p = 8 antennas. The reference lines are 
given by the predicted value in equation (|l6|), and the line INR' = INR. Although the pred icted value fits very well for sufficiently 



large INR, it is seen that for small INR, ma) loses its validity. This is because theorem [V. 1 is valid only for eigenvalues sufficiently 
above the noise power For small INRs, the estimated interference subspace will be a random vector, and the projection will have 
no effect on the INR. The cross-over point is approximately given by INR = ^ + -^^^ ■ The generalization for higher dimensional 
interference subspace is straightforward using the orthogonality of the interference eigenvectors. Note however that in this case the 
eigenvectors lose their natural interpretation as spatial signatures of the various interferers. 

C. Other spatial filtering possibilities 

Without going into too much detail, we mention a few other possibilities for spatial filtering and interference cancellation. 
Suppose there is a single interferer, 

R = R„ + cr? aa'' + ct^I . 



- Subtraction. With an estimate of a and an estimate of its power, we can try to subtract it from the covariance data: 

R = R-(Tfaa". (17) 

Without other knowledge, the best estimate of a is the dominant eigenvector, Ui, of R, and likewise the best estimate of is 
Ai — a^. Since both of these are derived from R, it turns out to be not too different from the projection scheme. Indeed, if we 
look at 



(I — auiu")R(I — auiu") = R — Uiu"Ai(2a — a'^) 

we can make it equal to ( p7[ ) by selecting a such that Ai (2a — a^) = (t^. The projection scheme had a = 1. 

- Spatial whitening. In this scheme, we try to make the interference plus noise white again. This component is equal to cr^ aa^ + 
(T^I, so we pre- and post-multiply with square-root factors of it: 

R - (a2aa"+a2l)-V2R(^2p^a«+a2l)-i/2 

- Subtraction of a reference signal. If we have a reference antenna that receives a 'clean' copy of the interfering signal, then we 
might try to subtract this reference signal from the telescope signals. There are many adaptive schemes for doing so, e.g., LMS 
or RLS. This solution involves a separate receiver for each interferer. To shorten the filter lengths sub-band processing is 
recommended, as otherwise the adaptation might be slow for wideband signals. 

This filtering scheme is similar to the first-mentioned subtraction scheme, except that the spatial signature a of the interferer is 
computed from correlations with the reference antenna. 

Note that each of these filtering schemes can be described as a linear operation on the entries of the observed data covariance 
matrix R^. In future equations, we will denote the linear operation by L^, and the output after filtering R^ = L^RfeL^. 

V. Fourier imaging after spatial filtering 

In the previous sections, we discussed spatial filtering techniques. It was shown that an attractive scheme for removing the 
interference is by projecting it out. However, by doing so we replace the observed visibilities V{ui, Vi) in the matrix R^, by some 
(known) linear combination. In this section, we discuss the implications of this for the imaging. 

A. Classical inverse Fourier imaging 

The relation between sky brightness I{£, m) and visibilities V{u, v) (where u, v are taken at frequency /) is 

We have measured V on a discrete set of baselines {{ui, Vi)}. The "dirty image" (a lumpy image obtained via direct Fourier 
inversion possibly modified with some weights a) is defined by 

lD{l,m) Qn".,«*)e'"^("''+'""^ (18) 
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It is equal to the 2D convolution of the true image / with a point spread function known as the "dirty beam": 



= JJ I{£',m') Bo{i^ r,m~m') de dm' 



dt' dm' 



or 



Id = / * -Bo 



So is the dirty beam, centered at the origin. The weights {ci] are arbitrary coefficients designed to obtain an acceptable beam- 
shape, with low side lobes, in spite of the irregular sampling. 

Specializing to a point source model, /(^, m) = /; 5{l — £i,m — mi) where // is the intensity of the source at location 
{ii,mi), gives 



V{u,v) 



iDie,m) 



I 



h Bo{£ - £i,m- mi) 



Thus, every point source excites the dirty beam centered at its location {£i, mi). 

From the dirty image Id and the known dirty beam Bq, the desired image / is obtained via a deconvolution process. A popular 
method for doing this is the CLEAN algorithm [|l2|]. The algorithm assumes that Bq has its peak at the origin, and consists of a loop 
in which a candidate location (£/ , m/ ) is selected as the largest peak in Id, and subsequently a small multiple of Bo{£ — £i, m — mi) 
is subtracted from Id- The objective is to minimize the residual, until it converges to the noise level. A short description of the 
algorithm is given in table |. The parameter 7 < 1 is called the loop gain and serves the purpose of interpolation over the grid. A; 
is the estimated power of the source. 

B. Inverse Fourier imaging after projections 

If we take projections or any other linear combination [aj] of the visibilities {V{ui, Vi)} during measurements, as in section 
|rv| , we have instead available 



Z{ui,Vi) = ^ Ci 
3 



Viuj.Vj) 



The coefficients Cij are connected to the linear operations {Lk\ of section IV, and Z{ui,Vi) are the samples contained in the 
collection {Rfc}. 

Suppose we compute the dirty image in the same way as before, but now from Z, 

iDie.m) := Z(«„t;,)e2-J("'^+-'™) 



Then 



Id{£, m) 



E^Ej Cij \jj I{£',m') e-'^^3(u,li'+v,ra') 

n m'.m') [e.e, c 



where 



= // /(f , m') B{£, m, £', m') d£' dm' 
B{£,m,£',m') '-^ 



d£' dm' 



r-.. „-2TTj(uje'+Vjm') 2TT]{uil+Vim) 
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Thus, the dirty image is again obtained via a convolution, but the dirty beam is now space-varying. B{£,m,£' ,m') is a beam 
centered at (£', to') and measured at {£,m). 
With a point source model, 

m,£i,mi) = ^ h Bi{£,m) 



where 



i 3 



, p-2Trj(uj<?i+Ujmi) 2-n:](uit.+Vim) 



Again, every point source excites a beam centered at its location (£; , to;), but the beams may all be different: they are space varying. 
Nonetheless, they are completely known if we know the linear combinations that we took during observations. Thus, the CLEAN 
algorithm in table | can readily be modified to take the varying beam shapes into account: simply replace i3o(^, fn) by Bi{£^ to) 
everywhere in the algorithm. Some remaining issues are 

1 . It is not a priori guaranteed that the main peak of Bi {£, to) is indeed centered at {£i ,mi). 

2. The noise is not necessarily white, and the coloring should be taken into account. 

3. The computational complexity is increased since we have to construct Bi{£, m) for every point to;). 



The first two points are addressed in section VI-C. 

To demonstrate the spatial variation effect of the dirty beam, we have generated an unfiltered dirty beam Bo{£, to) (figure ||), 
and the beams B{£, to, £' ,m') that would result after projecting out an interferer with a fixed terrestrial location. We show the latter 
for a source located at the center of the field, i.e., B{£, to, 0, 0) (figure Q), and for a source at (40", 30"), i.e., B{£, m, 40", 30") 
(figure ^.[] Note that the spatial projections have modified the shape of the beam, in particular the sidelobes, and that the response 
is not constant but varies with the location of the source. Although the changes do not look very dramatic, the differences are in 
fact important: accurate knowledge of the beam shapes is essential in the deconvolution step, especially if weak sources are to be 
detected among sources that are orders of magnitude stronger. 

VI. Imaging via beamforming techniques 

In this section, we reformulate the classical inverse-Fourier imaging technique and the CLEAN algorithm for deconvolution in 
terms of a more general iterative beamforming procedure. This is possible since we have a parametric point-source model, and the 
prime objective of the deconvolution step is to estimate the location of the point sources. The interpretation of the deconvolution 
problem as one of direction-of-arrival (DOA) estimation allows access to potentially a large number of algorithms that have been 
developed for this application. 

A. CLEAN and sequential beamforming 

We set out by showing how CLEAN can be interpreted as a sequential beam-forming procedure. 

Let us assume that we have available a collection of measured co variance matrices Rfe, obtained at times tk with fc = 1, • • • ,K, 
and let us assume the parametric model of (^, i.e., 

Rfc = AfcBAfc +(72l. 

Here, the unknown parameters are the source locations s; — {£i, mi), I = 1, - ■ ■ ,dm each of the Afe, and the source brightness J; 
in B. A natural formulation for the estimation of these parameters is to pose it as the solution of a LS cost function, given by 

K 

[{si},B] = argmin V || - Afc({s;}) BA^({s;}) - a^l \\p (19) 

{s,},B 

(B is constrained to be diagonal with positive entries.) This is recognized as the same model as used for DOA estimation in 
array processing. Note however that the array is moving (A^ is time-dependent), and that there are many more sources than the 

dimension of each covariance matrix. 

In this notation, the image formation in section V-A can be formulated as follows. Recall from (|]) and that 



V{uij{tk),v^j{tk)) = i\j{tk) = (Rfc)ji , k^l,---,K 

^ The beams have been computed for {ui,Vi) samples corresponding to a WSRT antenna configuration with p — lA telescopes and K = 100 
covariance epochs over 12 hours. 
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afc (^,m) 



-2Tr](uia{tk)e+vio(tk)m) 



,-27rj{upa{tk)t+VpQ{tk)m) 



Uio - Ujo , 

ViO - Vjo 



Thus, if we write Id{s) = loi^, m) and afc(s) = SLk{(-i m), we can rewrite the dirty image ( |18| ) as 

= Efe a«(s)Rfeafc(s). 

(We omitted the optional weighting. Also note that, with noise, we have to replace by R^ — a^I.) The iterative beam removing 
in CLEAN can now be posed as an iterative LS fitting between the sky model and the observed visibility pi]]. Finding the brightest 
point So in the image is equivalent to trying to find a point source using classical Fourier beamforming, i.e., , 



K 

argmax^afe(s) 

" k=i 



<y^i) afc(s) 



(20) 



Thus, the CLEAN algorithm can be regarded as a generalized classical sequential beamformer, where the brightest points are found 
one by one, and subsequently removed from R^ until the LS cost function ( [l9| ) is minimized. An immediate consequence is that 
the estimated source locations will be biased: a well known fact in array processing. When the sources are well separated the bias is 
negligible compared to the standard deviation, otherwise it might be significant. This gives an explanation for the poor performance 
of the CLEAN in imaging extended structures (see e.g., [|6l]). Another enhancement in the CLEAN can be made by turning it into 
an iterative scheme rather than sequential. In this case after estimating all point sources, we put one source at a time back into 
the data, and re-estimate it, using the estimates of the other point sources. This will improve the LS fit, and can easily be proved 
to converge. This approach is similar to the alternating projections approach for computing the deterministic ML DOA estimator 
0]. 

B. Minimum variance beamforming approaches 

Once we view image formation/deconvolution as equivalent to direction-of-arrival (DOA) estimation with a moving array, we 
can try to adapt various other DOA estimators for handling the image formation. In particular the deflation approach used in the 
CLEAN algorithm can be replaced by other source parameters estimators. One approach that seems particularly relevant in this 
context is the Minimum- Variance Distortionless Response (MVDR) method of beamforming [^. The major new aspect here is the 
fact that the array is moving and that there are more sources than sensors. 

Instead of working with the dirty image /_d(s) = a)!(s)Rfeafe(s), the basis for high-resolution beamforming techniques 
is to look at more general "pseudo-spectra" 



loi^) ■= Wfe(s)RfeV^ffe(s) 



(21) 



Here, Wfe(s) is the beamformer pointing towards direction s, and /^(s) is the output energy of that beamformer Previously we 
had Wfe(s) = afc(s); the objective is to construct beamformers that provide better separation of close sources. 

A generalized MVDR follows by defining the problem as follows: At each time instance k we would like to generate a weight 
vector Wfc which minimizes the output power at time k subject to the constraint that we have a fixed response towards the look 
direction s of the array, i.e.. 



Wfc(s) = argmin w^RfcWfc such that w^afc(s) = l. 



The solution to this problem is 



Wfc = /3fcR^ afc(s) , where Pk = — 



Inserting in ( |2l| ) shows that the overall spectral estimator is given by 



K 



a«(s)R^^afc(s) 



(22) 



h a)!(s)R^ia.(s) 



(23) 
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and the locations of the strongest sources are given by the maxima of (s). It is known that the MVDR has improved resolution 
compared to the classical beamformer which is the basis for the CLEAN algorithm. Figure ^ illustrates this by comparing a dirty 
image produced in the classical way to the dirty image corresponding to (|3]). In this simulation, we generated an extended structure 
by placing many point sources close to each other. The MVDR-based imaging produces a much sharper result. 

Compared to the CLEAN there is a slight computational loss. Another drawback of the MVDR is the fact that the noise 
distribution at the output of the beamformer is not identical towards all directions. Here we can use a modification ||^] which 
demands that Wfe has a functional form form = /3fcR^^afc(s), for some but now under the constraint that ||wfe|p = 1. This 
leads to the following estimator: 



K 



afc(s)Rfc^afc(s) 



^o(^) ^ >. H U-9 So = argmax /^(s). 



-{ a^(s)R^2afc(s) ' 



Many other techniques exist for estimating the point source locations. A good overview of the various possibilities can be found in 

C. CLEAN with spatial filtering 

Let us assume now that we have spatially filtered the covariance matrices Rfc by hnear operations Lfc, for example projections. 
If we assume that all the interference is removed by the filtering, the measurement equation becomes 

Rfc LfcRfcL^ = Lfc [Afc({s,})BA^({s,})+a2l] L^ (24) 

This modifies the least squares optimization problem to 

K 

{si},b\ = argmin ^ || (Rfc - Afc({s,}) BA^({s,}) - a^l) Ll\\p. (25) 
{s,}3,{r,}fe=i 

The cost function is similar to (^9|) and thus its minimization does not pose stronger computational demands. Indeed, as we 
mentioned before in section |V-B| , if we follow the classical Fourier-type imaging we end up with a deconvolution problem with a 
space-varying beam, but the CLEAN algorithm is simply extended to take this into account. Here, we develop the extension more 
carefully, taking note of the fact that the noise structure after projections is not white anymore. 

In the case of spatially filtered signals the classical beamformer follows from the previous by replacing afc(s) by the effective 
array response Lfcafc(s), i.e., 

^d(s) = Ef=iafe(s)L"(LfeRfcLfc - a^LkLl)Lka.k{s) 

= Ef=ia^(s)(L^LfcRfeLfcLfe - 'T2L«LfcL«Lfc)afc(s) (26) 
= Ef=ia;!(s)R'fcafc(s), 



where 



R-fc — LfcLfc Rfc LfcLfc — a^LfcLfc L^Lk (27) 



Therefore the step of finding the brightest point Sq in the image can be implemented using EFT in the same way it is implemented 
in the CLEAN algorithm, but acting on R'^ instead of the original visibilities. Similarly, the contribution of a source at location Sq 
in a single covariance matrix Rfc is a multiple of Lfcafc(so)a^(so)L^, and hence the response in the dirty image /^(s) is given by 



K 



B{s, So) := ^ al{s)Ll (Lfeafc(so)a5!(so)L)!) Lfcafc(s) . (28) 



fc=i 



This is the space-varying beam. The extended CLEAN algorithm after spatial filtering now follows immediately and is given in 
table 0- 

To test the algorithm, we have taken an array configuration with p = lA telescopes as in WSRT, and generated four equal- 
powered point sources centered around right ascension 32° and declination 60°, with a signal to noise ratio of —20 dB for each of the 
sources. To simulate the effect of spatial filtering, we placed an interferer at a fixed terrestrial location (hence varying compared to 
the look direction of the array), and with INR = 5 dB. K = 100 sample covariance matrices Rfc were generated, uniformly spread 
along 12 hours, and each based on = 1000 samples. Figure ^(a)-(c) shows the dirty image without interference present, the 
effect of the interferer on the dirty image, and the dirty image after estimating and removing the interferer using spatial projections. 
Clearly, with interference present but not removed, the sources are completely masked out (note the change in scale between the 
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first two figures). After estimating and projecting out the interferer, in the third image, we obtain nominally the same image as 
in the interference-free case, but the sidelobe patterns are different (as we demonstrated before, they are in fact space-varying). 
The circles in the third image mark the true location of the point sources, and the +-symbols mark the locations estimated by the 
extended CLEAN algorithm of table ||. We can clearly see that the correct locations have been obtained. This would not be the 
case with the unmodified CLEAN algorithm. 

D. Self-calibration 

To finish this section, we consider the situation where also the array gains Tk are unknown. In this case, the model fitting 
equation without spatial filtering, equation (|l9|), generalizes to 

K 

{sj},B,{ffc}l = argmin ^||Rfc-rfcAfe({s,})BA^({s,})r«-a2l||f (29) 

{s,},B,{r,}fe=i 

The solution can be obtained by the "self-cal" algorithm [^: an alternating least squares algorithm which solves iteratively for 
the parameters B, {s;} by a CLEAN step (with fixed gains), and the gain parameters {Tk] by a calibration step (with fixed source 
parameters B, {s;}). 

It has not been noted before in the Uterature that the latter step admits a direct algebraic solution. Indeed, to minimize ( p^ ) with 
fixed {Afc} and B, we can minimize separately for each k the related expression 

min \\Iik-TkAk-BAlTl-aH\\F (30) 

Let gfc be the vector of diagonal elements 7^. ^ of T^. Given an estimate of Aj. and B we can define for each k a {p x p) matrix 
Xfc with entries 

N _ (Rfc - cr'^I)ij 
^"^'^'^ ~ (A.BA^)., 

and fit gj. with entries g^ i such that (Xj.)ij = gk,igk,j ■ In the usual self-calibration algorithm, this equation is solved iteratively 
for all two-by-two sub-matrices of X^, using the so-called gain and phase closure relations. Instead, we note here that the problem 
admits a more elegant solution, since in matrix form, we have 

V H 

X/c = gkSk ■ 

This asks for the best hermitian rank-one approximation to the matrix X^,, which is known to be given by g^, = where Ai 

is the largest eigenvalue of Xfc and vi its corresponding eigenvector. 

More work is needed to generalize this to the model equation with spatial filtering. With fixed Ak and B, the minimization 
problem for the gain parameters is 

min ||L, (r, - r^AfcBA^r^ - a^l)Ll\\p (31) 

Let vec( • ) denote the operation of stacking all columns of a matrix into a vector, and let (g) denote the Kronecker product. A 
general property valid for matrices of compatible size is vec(ABC) = (C""^ (g) A)vec(B). Application to the present context gives 
that (|T]) asks for the solution in least squares sense of 

(Lfe®Lfe)vec(Rk-o-2l) = (L^ » L^) vec(rkAkBA;^rk) 

= (Lfe(g)Lfe)diag(vec(AkBA|^))(gkig)gk) 

which has the form 



hfe = FfeXfc , Xfc = gfe (g) gfe . 

If Ffe has a left inverse f|., we can find a unique LS solution for x^, i.e., x^ = Fj^h^, and then fit x^ = gfe (g gfe, or equivalently 
Xfe = gfcgfc, where X/j is derived from x/j by unstacking, i.e., vec(Xk) = Xk- This is precisely the solution which we obtained 
before. 

However, in the present case is a projection operator and hence F^ is not invertible. It is easy to see from examples, such 
as taking = [00]' ^^^^ ™ these cases Tk is not identifiable. A solution can be obtained if we make the reasonable assumption 
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that Ffe is constant over several epochs, say for fc, A; + 1, • • • , k + M — 1, and that is sufficiently varying over this period, for 
example due to multipath or fringe corrections. In that case, we obtain 









hfc+i 










Ffe+M-l 



Xfc , 



Xfe = gfe «) gfe . 



For sufficiently large M, the block matrix is tall and of full column rank, and has a left inverse. We can thus solve for an unstructured 
LS estimate ic^, and subsequently fit = (g) g^. 

The minimal number M of linearly independent matrices L^. which are needed follows from counting dimensions. If we 
project out q interferers on p antennas, L^^ has p — q independent rows and p columns. Hence F^. has {p — g)^ independent rows 
and p^ columns, and we need M{p — q)^ > p^. This gives modest requirements: if for p = 14 we take M = 2, then we can accept 
up to g = 4 interferers; with M = 4, up to g = 7. 

In summary, we have obtained an elegant and computationally non-intensive extension of the self-cal algorithm to complement 
the space-filtering CLEAN algorithm. 

VII. Maximum likelihood imaging 

A. Maximum likelihood functional 

Let us consider the imaging step from a more fundamental viewpoint. In principle, the construction of the image using the 
observed correlation matrices and assuming the parametric model can be viewed as a parameter estimation problem. One of the 
most important inference methods is the maximum likelihood method: Given a parametric family of probabilistic models for the 
received data, choose the parameters that maximize the probability of obtaining the observed data. This is different than the most 
probable image approach where no parametric model is imposed on the image, leading to maximum entropy image formation. 
Maximum likelihood estimators (MLEs) are known to be consistent and asymptotically statistically efficient (i.e., they provide 
unbiased estimators with minimum variance) under very general conditions, and thus are the natural choice for many parameter 
estimation problems. 

In deriving the maximum likelihood estimator of the image parameters, we need a parametric family of models for the astro- 
nomical signals. A reasonable assumption regarding the astronomical data is Gaussianity of the temporal samples^. This assump- 
tion is used in current imaging systems which rely only on second order statistics (both temporal and spatial). For simplicity we 
further assume that the samples are temporally white (valid for the relatively narrow bands processed, while over very large bands 
the black-body radiation pattern should be taken into account). For further discussion on emission mechanisms, and the resulting 
physical models of emission the reader is referred to [28|. Contrary to the claim in []3^], the corresponding MLE is not equivalent 
to parametric optimization of the CLEAN cost function. Using the discrete point source model we obtain: 



r,Afc({s,})BA;!({sz})r^ + a^l 



(32) 



where the d astronomical sources are Gaussian with covariance matrix B = diag[Ii , • • • ,Id] and sky coordinates {s;}f^j^, and 
the noise is Gaussian with covariance cr^I. Let R/j be the sample covariance matrix during the /c-th epoch, based on A^^ collected 
samples. The hkehhood of the observations at the A;-th epoch given map parameters {s;}, B, cr'^, F/j is then given by [|1|: 



p(Rfe|{sz},B,a2,Ffc) = 



TT^IRfc 



-tr(R-iRk) 



Using all K observation epochs we obtain that the log-likelihood function is given by (after omitting constants) 



£(Ri,... ,RK|B,{sz},{Ffc},a2) = 



K 

E 

fc=i 



K 



log|Rfc|-^7Vfctr(Ri^^Rk) 



(33) 



(34) 



k=l 



The MLE is found by maximizing ( p4[ ) over B, {s;}. {F/^}, a^. This maximization problem is prohibitively complex and hence 
some simplifications are needed. In some simplified cases in DOA estimation this has been dealt with. The Gaussian signals 
model for a static array with perfect calibration have been considered by [Q| which eliminated analytically some of the parameters. 
Derivation of the MLE for a single source in white Gaussian noise for the simplified model appeared in [|4^]. 

Since the CLEAN gives an approximate solution to the deconvolution problem we can use it to initialize a maximum likelihood 
search. In this case the CLEAN components serves as initial estimates to the MLE and the MLE serves the purpose of fine focusing 

^The assumption is valid for continuum emission, while for spectral line observations certain adjustments of the model are needed to include 
the lines structure. 
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of the image, by shifting each point source to its true value. The ML search itself can be done either using a gradient search, 
a Newton search based on the Fisher information matrix, or an EM algorithm. In appendix B we present the expressions of the 
gradient and the Hessian needed for the optimization. Since a good initialization is very important for optimizing the complicated 
equation ( ^ ) we will continue to derive an approximate coordinate descent MLE which is computationally simpler, and show its 
connection to CLEAN. 

B. Single source in colored noise 

One simplified approach to solve the MLE (|4|) which leads to good results in LS problems is the deflation approach or 
coordinate descent algorithm. In this approach the sources are extracted one by one and once we have obtained estimates of 
parameters of all sources we iterate the optimization along each parameter fixing the other parameters. Some examples of this 
approach are the alternating projections algorithm [Q, and the estimation of multipath parameters [|4l|]. 

Our basic approach to remove the effect of previously estimated sources will be to lump their contribution into the "noise part" 
of the covariance matrix. This essentially provides an a-posteriori estimate of the noise covariance matrix, before continuing to 
estimate the next source. Hence an important component of the algorithm will be the ML estimation of a single source in colored 
noise with known covariance matrix. In what follows we will derive an approximate ML estimator for this specific case. In order 
to reduce the notational complexity we will restrict ourselves to perfectly calibrated arrays, i.e., = I for all k, and note that 
estimating the calibration parameters will be done in a separate stage as it is currently done in the self calibration scheme. 

In order to reduce computational complexity even further, we would like to perform the source power estimation (conditioned 
on the location parameter) analytically. This will result in a two-dimensional search over the field of view for the location with 
highest likelihood for a point source, a task of moderate complexity. 

Assume that we are given a set of covariance matrices |Rfc : k — 1, . . . , which are sample estimates of where 



Rfc 



Aafe(s)a)!(s) + 



(35) 



and Cfc is the noise covariance matrix (assumed to be known). 
The log-likelihood function is given by: 
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k=l 



C{-ki,... ,Rk\X,s) =-Y,Nk [log|Rfc|+tr (Rk'Rk) 
Substituting (|5|) into we obtain 

CiRi, ... , Rk|A, s) = - ^ A^fc log |Aafc(s)a^(s) + C^l + tr ( (Aak(s)a^(s) + Ck)^' Rk 



(36) 



K 



k=l 



Further manipulation using tr(AB) = tr(BA) and |AB| = |A||B| yields 

£(Ri,... ,Rx|A,s) = -Ef=i^fc [log|AC-iafc(s)a^(s)+I| 

+ log|Cfc| + tr ((AC-iak(s)a^(s) + I)"' C-^Rk 

Since log |Cfc| does not depend on the parameters, the maximum likelihood estimate is given by minimizing 



C'iRi, ... , Rk\X, s) - Nk [log |AC^iafe(s)a^(s) + I| + tr ((AC^ 'ak(s)a;j(s) + I) C^^R^ 



k=l 

After some algebraic manipulations described in the appendix, we obtain that we have to minimize 

K 



£"(Ri,... ,Rk|A,s) =^iVfc 



fe=i 



log(l + Aafe) 



A/3fc 



1 + \ak 



(37) 



where a = a)!(s)Cj, ^afe(s) and/J^ = a)!(s)Cj, ^R^Cj, ^a/j(s). Taking the derivative of the left hand side with respect to A yields 
that the MLE of A is given by solving the equation 



dC" 
dX 



K 



fc=i 



Oik 



Pk 



l + Attfe 1 + Aafc (1 + Aafc)2 



0. 



(38) 



16 



Equation (^8|) is highly non-linear and hard to solve in the general case. We propose to simplify the problem by the following 
procedure. First estimate the source power using each of the covariance matrices separately and then average the estimates. 
Assuming that the ML estimate is consistent the above estimate will still be consistent. If we further assume that the statistical 
behavior of the various estimates is approximately the same (i.e., the Fisher information based on each time observation is almost 
constant) then the estimator is still efficient. This is formulated in the following lemma. 

Lemma VII. 1: Assume that /3(x|A) = J2'^=oo (^11 1-^)- — argmax^ £|| (x|| |A) be the MLE based on the k'th block 

of data. Assume that all J^. are equal say Jfc(A) = J, where is the Fisher information for A^ based on the data x^., and the 
likelihood £|| (x| A). Then A = X^aLi is an asymptotically efficient estimator of A. 

Proof: The Fisher information of the overall likelihood is given by Jk- Thus the CRB on estimating A is given by 

/^^ 1 1 

var{X) > 



Eti Jk KJ 

Since the MLE is asymptotically efficient (in the total number of samples Nk) its asymptotic variance achieves the CRB. On 
the other hand if we estimate A based on the fc'th block we obtain (by its asymptotic efficiency in Nk) that it has a variance given 
by 

var{\k) = 

Jk 

Now combining the various estimates we obtain: 

var{l) = -Ij. 

□ 

By the inequality of the harmonic and arithmetic mean we know that if the Fisher information is different at some time 
instances, then the averaged estimator has a larger variance, but the difference depends on the variation of the J^'s. However this 



degradation in performance gives us a large computational saving. By the additivity of the derivative we obtain from (38 ) that for 
each k the MLE Xk based on is given by 



Equivalently this can be written as 



(39) 



Afe = . h\^_. (40) 



and hence 



K 



Efe=l Nk u=l 



(a^(s)C-iafe(s))^ 



ag(s)C^^(Rfc-CfejC^iafe(s) 
(a«(s)C^iafe(s))2 



(41) 



Now that we have this approximate ML (AML) estimate of A we can plug it into the likelihood function and obtain that we 
have to maximize over the field of view the following: 



s = argmaxX: log (|1 + A(s)afe(.)|) - ^ ^^"f''^"] y (42) 
^ fr[ 1 + A(s)afc(s) 



The proposed coordinate-wise AML estimator is now summarized in table p| . 



^The stopping condition at step 4 can be tested either using a statistic or by comparing the level of the extracted point source to the noise 
level. 

*The final MLE focusing operation can use the same update equations for an alternating coordinate maximization. In this case we use the 
matrix inversion lemma twice: First we add the contribution of the last estimated coordinate to Cfc, and then we subtract from Ck the estimated 
contribution of the coordinate along which we would like to optimize. 
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Alternatively to substituting back into the likelihood we can try to find the direction which maximizes the received power. It is 
interesting to observe that the CLEAN algorithm can be derived as an approximation to this power maximizing estimator To that 
end maximize the power received from direction s, i.e., 



s = argmax A(s). 



(43) 



Using (|41|) we obtain: 



arg max 



K 

■E 



(a«(s)C-iafe(s))2 



(44) 



From (^ we obtain that C^; = AfeBA)! + cr^I, where A contains all the array responses towards the previously estimated 
astronomical sources. Assuming now that the power of the astronomical sources is negligible compared to the noise power (this is 
reasonable in many circumstances) and noting that that ||a(s)|p = p for every s by our normalization, we obtain that sa cr^I 
and that (44) simplifies to (EO). An iterative application of (gO) is exactly the operation of the CLEAN beam removing technique. 



VIII. Conclusions 

In this paper we have presented a parametric approach to radio-astronomical image formation. We have used this approach 
to adapt some known spectral estimators to the astronomical image formation problem. We analyzed the effect of interference 
suppression on imaging and proposed the necessary changes to the imaging step in order to accommodate the spatial filtering pre- 
processing. A new AML algorithm for deconvolution has been presented, and the CLEAN algorithm was derived by approximating 
the ML power estimates. Finally we have presented some simulated images demonstrating some of the ideas presented, and 
demonstrating the possible advantages in the parametric approach that leads to improved resolution. 

The work shows that the design of a new radio-telescope can (and probably should) use phased-array techniques to mitigate 
RFI, but the imaging software will have to be changed accordingly. 

In this paper we have only analyzed LS based deconvolution. In extension of this work [ ]l9| ] we analyze the MEM image forma- 
tion, in the context of adaptive interference suppression, and propose suitable changes to the imaging. The possible applications of 
this work spans beyond the field of radio-astronomy. One possible example is ISAR imaging in the presence of strong interferers. 
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I. 



Appendix 
Appendix A 



In this appendix we derive formula (p?]). To simplify the derivation we omit the explicit dependence on s. Let Rfc 



C4l + AC^^afca^]. hence R^^ = [l + AC^^afca^] 
lemma obtaining: 



To compute [l 



AC J. ^afca^ 



] , we use the matrix inversion 



[I- 



ACj, ^afea"] 



1 + Aa^C^^afc 



f-l-l„ „H 



Using tr(ab'^) — h^a., we obtain: 



tr (Rk 'Rk) = tr (Ck^Rk 



Aa?C,, "'"RkC. 



ak 



1 + AajJCk 'ak 



Since tr ^Ck ^Rk^ is independent of the parameters it can be omitted from the maximization. 

To evaluate log (|I + AC^'a^a^ |) note that the vector C^'afc is an eigenvalue of the rank one matrix AC^'afeaJ! with eigen- 
value AaJ!C^'afc. Therefore it is an eigenvector of I + AC^^'afeaJ! with eigenvalue 1 + Aa)!C^'afc. All the other eigenvalues 
of I + AC^'afea", are 1. Hence since the determinant is the product of the eigenvalues |I + AC^'afca^j = 1 + AaJ!Cj 
Substituting into C we obtain (^7|). 
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II. Appendix B 

In this appendix we present the gradient of the log-likelihood and the Fisher information matrix for the likelihood function 
given in equations (^2|),(|34l). This gives a possibility of obtaining rapidly convergent ML estimation, given a good initialization. 
We will not give the derivation in detail as it is rather standard. Similar derivation for the perfectly calibrated array model case can 
be found e.g., in [|3^]. 

Let the parameter vector be 

■0 = [l,m, A,7i, . . . ,7^] . 

where 1 = [li, . . . ,1^] is the vector of I coordinate of the astronomical point sources, m = [mi, . . . , m^] is the vector of m 
coordinate of the astronomical point sources (and — {li,mi)) are the location coordinates of the i'th astronomical source, 
A = [Ai, . . . , Xd] is the vector of brightness of the sources, and 7^. are the calibration parameters at the fc'th observation epoch. 
The score function is given by 



K 



and 



k=l 



K 



5vec(Rfc) 



/ 5vec(Rfe 



dtp 



(Rj:^(g)R^i)vec(Rfe-Rfe). 



dip 



We now turn to compute ^^^^p^- Equation can be rewritten as: 



vec(Rfe) = [a^(si)rfc (g) rfeafc(si), . . . ,a^{sd)Tl (^Tkak{sd)] X 



Hence we obtain 



and similarly 



gvec(Rfc) 
dX 



= [afe(si)r^(g)rfeafe(si),... ,a^(sd)rfc (g)rfcafe(sd) 



(45) 



(46) 



(47) 



(48) 



9vec (Rk 



^1 ((^)" ^ r,afc(si) + a«(si)r^ ^ r,.^(si) 

Xd ({^f {sd)Tl ® nakisd) + a^(s,)r^ ® r,^(srf) 



(49) 



9vec(RA: 
9m 



{{^f (^d)Tl ® rfea,(s,) + a^(s,)r^ < 



'r,^(s,) 



Finally the derivatives with respect to the calibration parameters are given by: 

d d 



Jn.k 



1=1 



where Si^k is Kronecker's delta, and 7„ ^ are given by (| 
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TABLE I 

The clean algorithm 



/ = 




while If) is not noise-like: 




{£i,mi) = argmaxl£)(^, m) 




A; = lD{£i,mi)/Bo{0,0) 




Id ■= Id - lhBo{l - £i,m- mi) 




J = l + l 


I = Id + Y.1 l^lBsynth{t - ii,m- mi) 



TABLE II 

The clean algorithm with spatial filtering 



Compute R' using ( 


B 


iM = Ek=i4i 


s)R'^afc(s) 


I = 




while I'j^ is not noise-like: 




s; = argmax/^(s) 




Compute B{s, s/) using (|28|) 




A; = Id{si)/B{si,si) 




/^(s) := /^(s) - ^XiBis,si) 




I =1 + 1 


I = Id + J2i ikBsynthi^ - Si) 



TABLE III 

Iterative approximate maximum likelihood 



1. Set the data covariance to be |r^''^ : k = 1, . . . , K^. 

2. Set the noise covariance matrix to be C^!^^ = cj^I. 

3. LetAl°)=Rf -4°). 

4. Until the residual is noise like perform the following: 

(a) Estimate the direction of a point source s; using (42). 

(b) Estimate A; using (pi]). 

(c) Let Cf+') = ) + 7A,a(sOa(sO^ 

(d) Update A^^) = A« - cf 

(e) Update (^C^'^^'*^ using the matrix inversion rank 1 update formula. 

(f) / := / + L 

5. Use the estimated components to initialize a MLE search. 

6. Reconstruct the image using the estimated directions, estimated powers 
and an ideal beam-shape. 






Fig. 2 

Residual INR after spatial filtering. 
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Fig. 3 

Classical dirty beam Bo{£,m), no interference suppression. 
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Fig. 4 

Dirty beam resulting after spatial filtering. Beam B{£, m, 0, 0) for a source at the center of the field of 

view. 
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Fig. 6 

(a) Conventional dirty image; (6) dirty image using MVDR beamforming. The dots represented the 

LOCATIONS OF THE POINT SOURCES MODELED. 
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Fig. 7 

Dirty images of four point sources, (a) no interference; (&) unsuppressed interference (INR 

AFTER SPATIAL FILTERING. 



= 5 dB); (c) 



